- 
                Notifications
    You must be signed in to change notification settings 
- Fork 665
Find keywords using perfect hashing #1590
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This updates the keyword lookup algorthm to use the phf crate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for thsi @davisp
| ]; | ||
| }; | ||
| pub fn lookup(keyword: &str) -> Keyword { | ||
| let keyword = keyword.to_ascii_uppercase(); | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we are going to optimize lookup I think we should also avoid this call to to_ascii_upprcase as it allocates / copies the string which I believe is significiantly more costly than some comparisons
https://doc.rust-lang.org/std/ascii/trait.AsciiExt.html#tymethod.to_ascii_uppercase
https://doc.rust-lang.org/std/primitive.str.html#method.to_uppercase
Since all keywords are ASCII, what if we did the binary search character by character
Here is some brain 🤮 :
let first_char = string.first().to_upper() 
// find range of keywords within the
let range = binary_search(keywords, first_char, 0);
// if not unique location, proceed to second character, etc
...
| use std::path::Path; | ||
|  | ||
| fn read_keywords() -> Vec<(String, Option<String>)> { | ||
| let path = Path::new("src").join("keywords.txt"); | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is quite clever
| As much as I love not rewriting code, I am quite loath to take on a new dependency in this crate. My concerns are 
 That being said,  Are you wiling to try the "compare without allocating approach" described in #1591 (comment)? | 
| Also an open question regarding maintenance of the phf crate rust-phf/rust-phf#318 | 
| Marking as draft as I think this PR is no longer waiting on feedback. Please mark it as ready for review when it is ready for another look | 
| Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days. | 
This updates the keyword lookup algorthm to use the phf crate.
This is one possible approach to solving #1588 at the cost of pulling in a new dependency on phf. I will note that phf is rather small and is being used in a
no_stdconfiguration, but it is a new dependency which may be enough to not go this route.Local measuring with the sqlparser_bench shows roughly a 5% speedup. I'll open a second PR for comparison with my other approach to just scope the binary search to words starting with the same first letter.